The Efficiency of Histogram-like Techniques for Database Query Optimization
Abstract
One of the most difficult tasks in modern database management systems is information retrieval. This task involves a user query, written in a high-level language such as the Structured Query Language (SQL), and a set of internal operations that are transparent to the user. The internal operations are carried out by complex modules that decompose, optimize and execute the query. We consider the problem of query optimization, in which the system chooses the most economical plan among many different query evaluation plans (QEPs). Since the number of QEPs increases exponentially with the number of relations involved in the query, query optimization is a very complex problem. Many estimation techniques have been developed to approximate the cost of a QEP, and histogram-based techniques are the most widely used in this context. In this paper, we discuss the efficiency of four of these methods: Equi-width, Equi-depth, the Rectangular Attribute Cardinality Map (R-ACM) and the Trapezoidal Attribute Cardinality Map (T-ACM). These methods are used to estimate the costs of the different QEPs, and thereby to determine the optimal one. It has been shown that the errors of the estimates from R-ACM and T-ACM are significantly smaller than the corresponding errors obtained from Equi-width and Equi-depth. This fact has been formally demonstrated using reasonable statistical distributions for the cost of a QEP, namely the doubly exponential distribution and the normal distribution. For the empirical analysis, we have developed a formal, rigorous prototype model to analyze these methods on random databases. Our empirical results demonstrate that R-ACM chooses a superior QEP more than twice as often as Equi-width and Equi-depth. Similar results have been obtained for T-ACM when compared to the traditional methods.
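To make the two traditional methods concrete, the following is a minimal sketch of Equi-width and Equi-depth histograms used to estimate the result size of a range query. It is an illustration only, not the paper's prototype: the function names (`equi_width`, `equi_depth`, `estimate_range`), the toy data set and the uniform-within-bucket assumption are all introduced here, and R-ACM and T-ACM are not implemented.

```python
def equi_width(values, buckets):
    """Equi-width: split [min, max] into ranges of equal width and count rows per range.
    Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # The maximum value would index one past the end, so clamp it into the last bucket.
        i = min(int((v - lo) / width), buckets - 1)
        counts[i] += 1
    bounds = [lo + i * width for i in range(buckets + 1)]
    return bounds, counts


def equi_depth(values, buckets):
    """Equi-depth: choose bucket boundaries so each bucket holds about the same number of rows."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = [round(i * n / buckets) for i in range(buckets + 1)]
    counts = [cuts[i + 1] - cuts[i] for i in range(buckets)]
    # Boundary i is the largest value that falls in bucket i - 1.
    bounds = [ordered[0]] + [ordered[c - 1] for c in cuts[1:]]
    return bounds, counts


def estimate_range(bounds, counts, a, b):
    """Estimated number of rows with a <= value <= b, assuming values are
    spread uniformly inside each bucket (an assumption of this sketch)."""
    total = 0.0
    for i, count in enumerate(counts):
        left, right = bounds[i], bounds[i + 1]
        if right == left:
            # Degenerate bucket holding a single repeated value.
            total += count if a <= left <= b else 0
            continue
        overlap = min(b, right) - max(a, left)
        if overlap > 0:
            total += count * overlap / (right - left)
    return total


# Hypothetical skewed attribute: 90 small values and 10 widely spaced large ones.
values = list(range(1, 91)) + [100 + 100 * i for i in range(10)]
true_size = sum(1 for v in values if 1 <= v <= 45)

width_estimate = estimate_range(*equi_width(values, 5), 1, 45)
depth_estimate = estimate_range(*equi_depth(values, 5), 1, 45)
print(true_size, width_estimate, depth_estimate)
```

On this particular skewed toy data the Equi-depth estimate lands closer to the true result size than the Equi-width estimate. That outcome depends on the data and the query, so it should not be read as a general ranking of the two methods.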
Indeed, in the most general scenario, we analytically prove that under certain models the better the accuracy of an estimation technique, the greater the probability of choosing the most efficient QEP.
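The link between estimation accuracy and plan choice can be illustrated with a simplified two-plan model. This is a sketch under assumptions introduced here, not the paper's own proof: each plan's estimated cost is taken to be its true cost plus independent, zero-mean normal noise with the same standard deviation `sigma`, and the optimizer picks the plan with the smaller estimate. Under that assumption the difference of the two estimates is normal with mean equal to the true cost gap and variance 2 * sigma^2, so the probability of picking the cheaper plan is Phi(gap / (sigma * sqrt(2))). The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_pick_cheaper(cost_a, cost_b, sigma):
    """Probability that the plan with the lower true cost also gets the lower estimate.

    Assumption of this sketch: estimate = true cost + N(0, sigma^2) noise,
    independent for the two plans, with sigma > 0.
    """
    gap = abs(cost_b - cost_a)
    return normal_cdf(gap / (sigma * sqrt(2.0)))


# Two hypothetical plans with true costs 100 and 110, under a less and a more accurate estimator.
for sigma in (20.0, 10.0, 5.0):
    print(sigma, prob_pick_cheaper(100.0, 110.0, sigma))
```

In this model the probability rises toward 1 as `sigma` shrinks and falls toward 0.5 (a coin flip) as `sigma` grows, which matches the qualitative statement in the abstract.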
Similar resources
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing database queries is one of the hard research problems. Exhaustive search techniques such as dynamic programming are suitable for queries with few relations, but as the number of relations in a query increases, they require a great deal of memory and processing and are no longer suitable, so random and evolutionary methods have to be used. The use of evolutionary methods, beca...
Rectangular Attribute Cardinality Map: A New Histogram-like Technique for Query Optimization
Current database systems utilize histograms to approximate frequency distributions of attribute values of relations. These are used to efficiently estimate query result sizes and access plan costs. Even though they have been in use for nearly two decades, there have been no significant mathematical techniques (other than those used in statistics for traditional histogram approximations) to study...
Using histograms to estimate answer sizes for XML queries
Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in “next-step” decisions, such as query refinement. This paper proposes a technique to obtain query result size estimates effectively in an XM...
Estimating Answer Sizes for XML Queries
Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in “next-step” decisions, such as query refinement. This paper proposes a technique to obtain query result size estimates effectively in an XM...
An Empirical Comparison of Histogram-Like Techniques for Query Optimization
We consider the problem of Query Optimization which consists of a database system choosing, among many different Query Evaluation Plans (QEPs), the most economical one for a given query. Since the number of QEPs increases exponentially with the number of relations involved in the query, query optimization is a very complex problem. Many estimation techniques have been developed in order to approxim...
Journal title:
- Comput. J.
Volume 45, Issue
Pages -
Publication date: 2002